5.1 - Goal - Learning a Combat Skill - Kiting
This chapter marks a transition from macro-level economic tasks to unit micro-management. Our third agent will be trained to master a fundamental combat tactic known as kiting, or "stutter-stepping."
The goal is to move beyond resource management and teach an agent to make optimal, split-second decisions in a dynamic, adversarial combat scenario.
The Desired Behavior - A State-Based Kiting Policy
Kiting is a hit-and-run policy that lets a ranged unit defeat a faster melee attacker: fire, retreat while the weapon reloads, then fire again. The agent must learn to switch between "Attack" and "Move" states based on its weapon cooldown and its distance to the target.
The Ideal Policy (Finite State Machine):
                          Target is in range
    +------------------+ -----------------------> +------------------+
    |      State:      |                          |      State:      |
    |  MOVING TOWARDS  | <----------------------- |    ATTACKING     |
    |      TARGET      |  Target is out of range  |                  |
    +------------------+                          +---+----------+---+
                                                      |          ^
                                  Weapon is on        |          |  Weapon cooldown
                                  cooldown            |          |  is over
                                                      v          |
                                               +------+----------+------+
                                               |         State:         |
                                               |  KITING (MOVING AWAY)  |
                                               +------------------------+
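For reference, the sketch below expresses this state machine in plain Python. It is not the learned agent, just a hand-coded baseline for the behavior the agent should discover on its own; the `KitingState` enum and `next_state` function are illustrative names, not part of any library.

```python
from enum import Enum, auto

class KitingState(Enum):
    MOVING_TOWARDS = auto()
    ATTACKING = auto()
    KITING = auto()

def next_state(state: KitingState, target_in_range: bool,
               weapon_on_cooldown: bool) -> KitingState:
    """Transition function for the ideal kiting policy diagrammed above."""
    if state is KitingState.MOVING_TOWARDS:
        # Close the gap until the target enters weapon range.
        return KitingState.ATTACKING if target_in_range else state
    if state is KitingState.ATTACKING:
        if not target_in_range:
            return KitingState.MOVING_TOWARDS
        if weapon_on_cooldown:
            # The shot is away; retreat while the weapon reloads.
            return KitingState.KITING
        return state
    # KITING: keep moving away until the weapon is ready to fire again.
    return KitingState.ATTACKING if not weapon_on_cooldown else state
```

A scripted bot stepping this function every frame will stutter-step correctly; the RL agent must learn the same switching behavior from reward alone.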
Success Criteria
- The agent (a single Marine) must learn to consistently defeat a single, faster Zergling.
- The agent's policy must correctly utilize `weapon_cooldown` to decide when to move.
- The agent's policy must correctly utilize `target_in_range` to decide when to attack or reposition.
Controlled Scenario Setup
To isolate the kiting task, we will use debug commands to create a perfect, repeatable 1v1 scenario at the start of each episode (a sketch of this reset logic follows the list below).
- Spawn one friendly Marine for the agent to control.
- Spawn one enemy Zergling at a fixed distance.
- Immediately remove all starting worker units.
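A minimal sketch of this reset, assuming the python-sc2 (BurnySC2) client: `debug_create_unit` and `debug_kill_unit` are debug commands that library exposes, but the spawn coordinates and the `setup_1v1_scenario` helper are illustrative.

```python
from sc2.ids.unit_typeid import UnitTypeId
from sc2.position import Point2

MARINE_SPAWN = Point2((30, 30))    # illustrative map coordinates
ZERGLING_SPAWN = Point2((38, 30))  # fixed spawn distance from the Marine

async def setup_1v1_scenario(bot):
    """Reset the map to a clean 1v1: one Marine (ours) vs. one Zergling."""
    # Remove every starting worker so combat is the only learning signal.
    worker_tags = {w.tag for w in bot.workers}
    worker_tags |= {u.tag for u in bot.enemy_units if u.type_id == UnitTypeId.DRONE}
    if worker_tags:
        await bot.client.debug_kill_unit(worker_tags)
    # Spawn commands: [unit type, count, position, owning player].
    await bot.client.debug_create_unit([
        [UnitTypeId.MARINE, 1, MARINE_SPAWN, 1],     # player 1: the agent
        [UnitTypeId.ZERGLING, 1, ZERGLING_SPAWN, 2], # player 2: the opponent
    ])
```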
Environment Design
1. Observation Space Specification
The observation vector is designed to provide all critical information for a 1v1 combat engagement.
- Gymnasium Type: `gymnasium.spaces.Box`
- Shape: `(5,)`
Index | Feature | Rationale |
---|---|---|
0 | `marine.health_percentage` | "How much danger am I in?" |
1 | `marine.weapon_cooldown > 0` | (Key Signal) "Is my weapon ready to fire?" |
2 | `zergling.health_percentage` | "How close am I to winning?" |
3 | `marine.distance_to(zergling)` | "Is my positioning optimal?" |
4 | `marine.target_in_range` | (Key Signal) "Is attacking a valid move from this position?" |
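A sketch of the space and its encoder is below, assuming python-sc2-style unit attributes (`health`, `health_max`, `weapon_cooldown`, `distance_to`, `target_in_range`); the `MAX_DISTANCE` normalization constant and the `build_observation` helper are assumptions, not values from the text.

```python
import numpy as np
import gymnasium as gym

observation_space = gym.spaces.Box(low=0.0, high=1.0, shape=(5,), dtype=np.float32)

MAX_DISTANCE = 20.0  # assumed cap for normalizing distance into [0, 1]

def build_observation(marine, zergling) -> np.ndarray:
    """Encode the 1v1 engagement as the 5-feature vector from the table above."""
    return np.array([
        marine.health / marine.health_max,        # 0: how much danger am I in?
        float(marine.weapon_cooldown > 0),        # 1: key signal - weapon status
        zergling.health / zergling.health_max,    # 2: how close am I to winning?
        min(marine.distance_to(zergling) / MAX_DISTANCE, 1.0),  # 3: positioning
        float(marine.target_in_range(zergling)),  # 4: key signal - attack valid?
    ], dtype=np.float32)
```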
2. Action Space Specification
The agent is given a set of discrete combat maneuvers.
- Gymnasium Type: `gymnasium.spaces.Discrete`
- Size: `3`
Action | Agent's Intent |
---|---|
0 | Attack Target: Engage the enemy. |
1 | Move Away: Increase distance (the "kite" action). |
2 | Move Towards: Decrease distance (the "chase" action). |
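A sketch of the space plus a dispatcher that turns an action index into a unit order, again assuming the python-sc2 command API (`unit.attack`, `unit.move`, `Point2.towards`); `KITE_STEP` and `execute_action` are illustrative.

```python
import gymnasium as gym

action_space = gym.spaces.Discrete(3)

KITE_STEP = 3.0  # illustrative distance for each kite move order

def execute_action(action: int, marine, zergling) -> None:
    """Translate a discrete action index into a unit order."""
    if action == 0:
        marine.attack(zergling)  # 0: Attack Target
    elif action == 1:
        # 1: Move Away - a negative distance in towards() steps away from the target.
        marine.move(marine.position.towards(zergling.position, -KITE_STEP))
    else:
        marine.move(zergling.position)  # 2: Move Towards
```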
3. Reward Function Specification
The reward must directly incentivize combat efficiency.
Design Philosophy:
- Health Differential: The primary learning signal is the change in relative health between the two units. This directly rewards dealing damage while avoiding taking damage.
- Large Terminal Reward: A definitive win/loss signal at the end of the episode provides a clear, final objective.
Reward Calculation (on each step):
reward = (previous_zergling_hp - current_zergling_hp) - (previous_marine_hp - current_marine_hp)
This creates a zero-sum reward where positive values mean the agent won the trade, and negative values mean it lost the trade. A large terminal reward (`+100` for a win, `-100` for a loss) is added at the end of the episode.
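Putting both pieces together, here is a sketch of the full per-step reward; the function name and the win check against the Zergling's remaining health are illustrative.

```python
WIN_REWARD = 100.0  # magnitude of the terminal win/loss bonus

def compute_reward(prev_marine_hp: float, cur_marine_hp: float,
                   prev_zergling_hp: float, cur_zergling_hp: float,
                   done: bool) -> float:
    """Health differential plus a terminal win/loss bonus."""
    # Damage dealt this step minus damage taken this step.
    reward = (prev_zergling_hp - cur_zergling_hp) - (prev_marine_hp - cur_marine_hp)
    if done:
        # Win if the Zergling died; otherwise the Marine did.
        reward += WIN_REWARD if cur_zergling_hp <= 0 else -WIN_REWARD
    return reward
```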